Abstract

Background: Genomic selection uses dense single nucleotide polymorphism (SNP) markers to predict breeding
values, whereas conventional evaluations estimate polygenic effects based on phenotypic records
and pedigree information. The objective of this study was to compare polygenic, genomic and combined
polygenic-genomic models, including mixture models (labelled according to the percentage of genotyped SNP
markers considered to have a substantial effect, ranging from 2.5% to 100%). The data consisted of phenotypes and
SNP genotypes (10,946 SNPs) of 2,188 mice. Various growth, behavioural and physiological traits were selected for
the analysis to reflect a wide range of heritabilities (0.10 to 0.74) and numbers of detected quantitative trait loci
(QTL) (1 to 20) affecting those traits. The analysis included estimation of variance components and cross-validation
within and between families.

Results: Genomic selection showed a high predictive ability (PA) in comparison to traditional polygenic selection, 
especially for traits of moderate heritability and when cross-validation was between families. This occurred although 
the proportion of genomic variance of traits using genomic models was 22 to 33% smaller than using polygenic 
models. Using a 2.5% mixture genomic model, the proportion of genomic variance was 79% smaller relative to the 
polygenic model. Although the proportion of variance explained by the markers was reduced further when a 
smaller number of SNPs was assumed to have a substantial effect on the trait, PA of genomic selection for most 
traits was little affected. These low mixture percentages resulted in improved estimates of single SNP effects. 
Genomic models implemented for traits with fewer QTLs showed even lower PA than the polygenic models. 

Conclusions: Genomic selection generally performed better than traditional polygenic selection, especially in the 
context of between family cross-validation. Reducing the number of markers considered to affect the trait did not 
significantly change PA for most traits, particularly in the case of within family cross-validation, but increased the 
number of markers found to be associated with QTLs. The underlying number of QTLs affecting the trait has an 
effect on PA, with a smaller number of QTLs resulting in lower PA using the genomic model compared to the 
polygenic model. 
Background

Recently, high-density single nucleotide polymorphism
(SNP) arrays have been developed for a broad range of
species, including humans, mice, plant species such as
barley, wheat and maize, and major livestock species such
as cattle, pigs, sheep and chickens. In the past, selective
breeding in plant and livestock species was based on
phenotypic information combined with extensive pedigrees
using best linear unbiased prediction. The use of
high-density SNP arrays has opened the opportunity to use
genomic information to estimate genomic breeding values
for individuals [1]. Estimating a breeding value from the
genotype of an individual may provide large benefits when
a species has a long generation interval, e.g. oil palm [2],
or when the trait of interest is recorded in one sex only,
e.g. milk production [3]. Other traits that may benefit from
genomic selection are behavioural traits in animals, which
are often costly and time-consuming to measure routinely,
e.g. aggressiveness [4].

The high cost of genotyping, especially for the high
density SNP arrays, limits the extent to which routine
genotyping can be implemented in practice. Additionally,
many of the SNPs contribute little to the genetic variance
of a trait, as was found, for example, for human height
variation [5] or complex disease traits [6]. Moreover,
statistical limitations can arise when the number of SNP
effects far exceeds the amount of phenotypic data available.
For these reasons, there could be interest in reducing the
number of SNPs while maintaining efficiency of selection.
Costs of genotyping may be reduced by genotyping only
part of the population, e.g. [7], or a two-step approach
could be used to prioritize SNPs for genotyping with lower
density SNP arrays, e.g. [8]. To circumvent the statistical
limitations, many different approaches have been developed
to reduce the number of SNP effects to be estimated,
e.g. [9,10].

The aim of this study was to assess the efficiency of
genomic selection using mouse data and how it is affected
by a) the heritability of the trait, b) the number of QTLs
affecting the trait, c) the type of trait ('classical' traits that
are easily measurable versus behavioural traits) and d) the
number of SNP markers in the model allowed to have a
substantial effect. Various models are fitted (including
polygenic and/or genomic effects), and cross-validation
performance within and between families is compared.

Results 

Variance components 

Tables 1, 2 and 3 show estimates of the total phenotypic
variances, heritabilities based on polygenic effects,
proportions of the phenotypic variance attributed to genomic
effects, and proportions of the phenotypic variance
attributed to cage effects. Estimated variance components
are based on the full dataset and are presented for seven
models, namely: models (1), (2) and (3), and sub-models
with 10% and 2.5% of the markers assumed to be associated
with a substantial effect using models (2) and (3). Results
based on sub-models using mixtures of 70%, 40%, 7.5% and
5% are not presented, because they showed the same trend
as the 10% and 2.5% mixtures.

Analyses of weight traits based on model (1), using
polygenic effects only, showed slightly lower heritabilities
(Table 1) compared to those reported by Valdar et al. [11].
The differences are likely due to the different fixed effects
fitted in the models. For behavioural and physiological
traits, Tables 2 and 3 show estimates of heritabilities of
comparable magnitude to those reported by Valdar et al.
[11]. Using model (1), phenotypic proportions of cage
variances were low for the behavioural traits (4 to 7% of the
total variance, Table 2) compared to weight and
physiological traits (15 to 29%, Tables 1 and 3).

Phenotypic proportions of genomic variances of weight,
behavioural and physiological traits using genomic model
(2) were 22 to 31%, 23 to 33% and 25 to 30% lower,
respectively, than those using the polygenic model (1)
(Tables 1 to 3). This was compensated for by an increase
in variances attributed to the cage effects and/or error
effects, depending on the trait. Using the 2.5% mixture in
genomic model (2) - i.e. 2.5% of the genotyped SNP markers
assumed to have a substantial effect on the trait - the
phenotypic proportions of variance of genomic effects
were 65 to 79%, 43 to 61% and 60 to 69% lower than
estimates of the heritability from the polygenic model (1)
for weight, behavioural and physiological traits,
respectively. The underestimation of variances of genomic
effects compared to variances of polygenic effects may be
due to incomplete linkage disequilibrium between SNP
markers and causal variants, and to low frequencies of
these causal variants [12].
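The percentage reductions quoted above can be reproduced from the point estimates in Table 1. The following is a minimal sketch, assuming the reduction is computed relative to the model (1) heritability; the function name and the dictionary layout are illustrative and not part of the original analysis.

```python
def relative_reduction(h2_polygenic, h2_genomic):
    """Percentage by which a genomic proportion falls below the polygenic heritability."""
    return 100.0 * (h2_polygenic - h2_genomic) / h2_polygenic


# Point estimates read from Table 1: heritability from model (1) and
# genomic proportions from model (2) with 100% and 2.5% mixtures.
table1 = {
    "W6": {"h2_u": 0.52, "h2_a_100": 0.36, "h2_a_2.5": 0.11},
    "W10": {"h2_u": 0.55, "h2_a_100": 0.43, "h2_a_2.5": 0.19},
}

for trait, est in table1.items():
    full = relative_reduction(est["h2_u"], est["h2_a_100"])
    sparse = relative_reduction(est["h2_u"], est["h2_a_2.5"])
    print(f"{trait}: {full:.0f}% lower (100% mixture), {sparse:.0f}% lower (2.5% mixture)")
```

For W6 and W10 this gives the end points of the ranges reported for weight traits (22 to 31% for the full genomic model and 65 to 79% for the 2.5% mixture).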

In model (3), additionally fitting polygenic effects
essentially captured part of the genetic variance that was
not accounted for by the genomic effects. The total variance
attributed to genetic effects (polygenic and genomic) was
in line with the polygenic variance found in model (1).
Phenotypic proportions of the variance of the genomic
effects using model (3) were consistently lower than in
model (2). The phenotypic proportions of the variance of
the genomic effects were 33 to 44%, 37 to 52% and 44 to
50% lower for weight, behavioural and physiological traits,
respectively, than the phenotypic proportions of the
variance of the polygenic effects obtained from model (1).
For these traits, the phenotypic proportions of the variance
of the polygenic effects accounted for 40 to 48%, 31 to 50%
and 50 to 80%, respectively, of the heritability estimated
using model (1). The use of different mixtures in model (3)
resulted in a 75 to 92%, 60 to 70% and 75 to 80% decrease
in phenotypic proportions of variance of the genomic
effects compared to the corresponding proportions of
Table 1 Estimated variance components and heritabilities for weight traits

Trait  Model      σ²p                  σ²c               σ²e               h²u               h²a
W6     (1)        110.8 (99.5-122.8)   31.3 (24.6-38.1)  21.1 (9.4-32.5)   0.52 (0.38-0.69)  -
W6     (2) 100%   117.1 (108.1-126.1)  39.0 (31.5-46.4)  35.8 (32.4-39.3)  -                 0.36 (0.32-0.40)
W6     (2) 10%    104.7 (96.3-113.6)   43.8 (36.1-51.6)  43.8 (40.0-47.5)  -                 0.16 (0.12-0.21)
W6     (2) 2.5%   104.8 (96.1-113.4)   46.9 (38.7-55.5)  46.3 (42.5-49.9)  -                 0.11 (0.07-0.15)
W6     (3) 100%   119.0 (107.9-129.6)  33.1 (26.2-40.5)  22.5 (13.9-30.5)  0.25 (0.12-0.36)  0.29 (0.25-0.33)
W6     (3) 10%    107.9 (98.1-117.7)   32.7 (25.9-39.6)  25.8 (17.5-33.2)  0.33 (0.21-0.46)  0.12 (0.08-0.16)
W6     (3) 2.5%   108.8 (99.0-118.8)   32.3 (25.0-38.9)  24.2 (15.7-32.4)  0.40 (0.28-0.53)  0.08 (0.05-0.11)
W6m    (1)        110.0 (98.7-122.0)   31.5 (24.3-38.3)  22.2 (11.2-33.8)  0.51 (0.36-0.66)  -
W6m    (2) 100%   118.0 (108.3-126.6)  39.1 (31.2-46.6)  36.1 (32.7-39.7)  -                 0.36 (0.33-0.40)
W6m    (2) 10%    103.7 (95.1-112.3)   45.9 (38.3-54.2)  46.1 (42.3-49.8)  -                 0.11 (0.07-0.15)
W6m    (2) 2.5%   104.3 (95.5-113.3)   49.3 (40.8-57.9)  48.2 (44.4-52.0)  -                 0.06 (0.03-0.10)
W6m    (3) 100%   117.6 (107.9-128.1)  33.8 (26.8-41.3)  26.2 (19.5-32.9)  0.20 (0.10-0.31)  0.29 (0.25-0.33)
W6m    (3) 10%    104.8 (96.5-114.7)   32.9 (26.2-40.4)  28.7 (21.6-36.5)  0.35 (0.23-0.47)  0.06 (0.03-0.10)
W6m    (3) 2.5%   106.6 (96.7-116.1)   32.2 (25.4-39.3)  26.4 (17.6-35.1)  0.41 (0.27-0.54)  0.04 (0.01-0.07)
W10    (1)        125.7 (113.6-139.9)  19.0 (12.8-25.4)  36.6 (22.1-50.3)  0.55 (0.40-0.71)  -
W10    (2) 100%   133.8 (124.9-143.3)  28.2 (21.2-35.0)  48.2 (43.3-52.7)  -                 0.43 (0.40-0.47)
W10    (2) 10%    120.2 (111.5-129.3)  31.6 (24.2-38.6)  58.9 (54.0-64.5)  -                 0.25 (0.20-0.29)
W10    (2) 2.5%   119.7 (111.0-128.7)  34.6 (27.1-42.3)  62.7 (57.7-68.0)  -                 0.19 (0.14-0.23)
W10    (3) 100%   138.3 (126.8-150.9)  22.3 (15.9-29.0)  34.1 (24.1-43.9)  0.22 (0.10-0.33)  0.37 (0.33-0.41)
W10    (3) 10%    124.9 (114.3-136.1)  21.2 (15.0-27.4)  39.0 (29.6-48.6)  0.32 (0.20-0.45)  0.19 (0.15-0.25)
W10    (3) 2.5%   125.8 (113.6-137.4)  20.3 (14.7-26.7)  37.5 (25.6-47.7)  0.40 (0.27-0.54)  0.14 (0.09-0.18)

1. Trait: W6 = Weight at week 6; W6m = Weight at week 6, missing marker genotypes were treated as 3rd allele; W10 = Weight at week 10; observations of all traits
were multiplied by 10².

2. Model: (1) = polygenic; (2) = genomic; (3) = polygenic and genomic; with 100/10/2.5% of the markers allowed to have an effect.

3. σ²p: estimates of the total phenotypic variances.

4. σ²c: estimates of variances attributed to the cage effect.

5. σ²e: estimates of residual variances.

6. h²u: heritability based on the polygenic effect.

7. h²a: proportion of the variance attributed to the genomic effect.

The 95%-highest posterior density intervals are given in parentheses.



polygenic effects from model (1) for the respective traits.
The phenotypic proportions of the variance of the
polygenic effects for weight, behavioural and physiological
traits accounted for 73 to 80%, 60 to 61% and 81 to 100%,
respectively, of the phenotypic proportions of the variance
of the polygenic effects from model (1).

Using model (3), in general, a lower mixture percentage
led to a decrease in phenotypic proportions of the
variance of the genomic effects and to an increase in the
phenotypic proportions of variance of the polygenic
effects in all traits. Comparing W6 and W6m, treating
missing alleles as a separate 3rd allele resulted in small
changes in proportions of the variance of the genomic
effects with lower mixture percentages.
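As a worked check of how model (3) splits the genetic variance, the sketch below adds the polygenic and genomic proportions for W6 from Table 1 and expresses the polygenic part as a share of the model (1) heritability. The variable and function names are illustrative only, and the calculation uses point estimates without their intervals.

```python
# Heritability of W6 from the polygenic model (1), Table 1.
W6_MODEL1_H2 = 0.52

# (h2_u, h2_a) point estimates for W6 under model (3), by mixture percentage.
w6_model3 = {"100%": (0.25, 0.29), "10%": (0.33, 0.12), "2.5%": (0.40, 0.08)}


def total_genetic_proportion(h2_u, h2_a):
    """Sum of the polygenic and genomic proportions of phenotypic variance."""
    return h2_u + h2_a


def polygenic_share(h2_u, h2_model1):
    """Polygenic proportion in model (3) as a percentage of the model (1) heritability."""
    return 100.0 * h2_u / h2_model1


for mixture, (h2_u, h2_a) in w6_model3.items():
    total = total_genetic_proportion(h2_u, h2_a)
    share = polygenic_share(h2_u, W6_MODEL1_H2)
    print(f"{mixture}: total genetic proportion {total:.2f}, polygenic share {share:.0f}%")
```

With the full model (3), the total of 0.54 is close to the model (1) heritability of 0.52, and the polygenic share of 48% matches the upper end of the range reported for weight traits.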

Predictive ability 

Tables 4 and 5 show the average predictive abilities (PA)
based on cross-validation within (W) or between families
(B). PA was calculated using ten training and validation
sets, and is shown for all three models and their
sub-models using different mixtures. Within family
cross-validation always performed substantially better, as
expected because of the higher genetic connectedness
between the training and validation datasets. Within family
cross-validation resulted in little change between PAs for
all models for most traits, with only W6 and TA showing
an increase in PA using models (2) and (3) compared to
model (1).

In contrast, using between family cross-validation,
model (1) resulted in substantially lower PA than models
(2) and (3) for most traits. This was especially visible for
traits with moderate to high heritabilities (e.g. for W6:
PA of 0.15 vs. 0.24, or for TF: PA of -0.04 vs. 0.19 using
models (1) and (2), respectively; Tables 4 and 5). For traits
with low heritabilities there was little difference in PA
between model (1) and the other two models.
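The difference between the two cross-validation designs can be illustrated with a small sketch. It assumes that PA is measured as the Pearson correlation between predicted and observed values in the validation set, and that folds are formed either by keeping whole families together (between) or by spreading each family over the folds (within). Neither detail is spelled out in this section, so the functions below are hypothetical illustrations rather than the procedure used in the study.

```python
import math
import random
from collections import defaultdict


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def between_family_folds(family_ids, n_folds, seed=0):
    """Assign whole families to folds, so relatives never straddle training and validation."""
    rng = random.Random(seed)
    families = sorted(set(family_ids))
    rng.shuffle(families)
    fold_of_family = {fam: i % n_folds for i, fam in enumerate(families)}
    return [fold_of_family[fam] for fam in family_ids]


def within_family_folds(family_ids, n_folds, seed=0):
    """Spread the members of each family over the folds, so each validation
    animal has relatives in the training data."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, fam in enumerate(family_ids):
        members[fam].append(index)
    folds = [0] * len(family_ids)
    for indices in members.values():
        rng.shuffle(indices)
        offset = rng.randrange(n_folds)
        for position, index in enumerate(indices):
            folds[index] = (position + offset) % n_folds
    return folds
```

In the between-family design the validation animals have no close relatives in the training set, which is the situation in which the polygenic model (1) lost most of its predictive ability in Tables 4 and 5.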

TA was the only trait to show significant differences in 
PA for within as well as between family cross-validation 
using model (2) with different mixtures. PA was stable at 
first with the low mixture models, but with mixtures 
Table 2 Estimated variance components and heritabilities for behavioural traits

Trait  Model      σ²p                     σ²c                 σ²e                     h²u               h²a
TA     (1)        61.1 (56.3-66.1)        2.3 (0.2-4.5)       37.6 (32.0-43.0)        0.35 (0.23-0.46)  -
TA     (2) 100%   60.0 (56.4-63.6)        3.3 (1.1-5.7)       40.5 (37.3-43.8)        -                 0.27 (0.24-0.30)
TA     (2) 10%    60.7 (56.6-64.8)        3.8 (1.3-6.1)       41.7 (38.4-45.1)        -                 0.25 (0.21-0.30)
TA     (2) 2.5%   60.6 (56.8-64.8)        5.0 (2.3-7.6)       43.5 (40.1-46.9)        -                 0.20 (0.16-0.24)
TA     (3) 100%   59.7 (55.6-63.7)        2.5 (0.1-4.5)       37.8 (33.5-42.3)        0.11 (0.03-0.19)  0.22 (0.18-0.25)
TA     (3) 10%    60.7 (56.4-65.3)        2.5 (0.2-4.5)       38.4 (34.2-42.5)        0.12 (0.04-0.20)  0.21 (0.15-0.26)
TA     (3) 2.5%   61.3 (56.7-66.2)        2.3 (0.3-4.5)       37.5 (32.9-42.3)        0.21 (0.10-0.31)  0.14 (0.10-0.19)
TF     (1)        1243.4 (1128.1-1351.9)  45.6 (2.8-94.5)     790.0 (655.5-924.6)     0.33 (0.20-0.47)  -
TF     (2) 100%   1222.4 (1132.2-1308.8)  82.8 (26.4-143.5)   871.3 (787.1-951.8)     -                 0.22 (0.19-0.25)
TF     (2) 10%    1206.3 (1119.1-1303.4)  89.2 (17.1-149.6)   905.7 (822.5-993.2)     -                 0.18 (0.13-0.23)
TF     (2) 2.5%   1206.1 (1115.6-1304.2)  112.8 (54.5-176.9)  937.9 (852.0-1024.4)    -                 0.13 (0.08-0.17)
TF     (3) 100%   1233.0 (1133.8-1341.0)  49.5 (0.6-99.1)     794.2 (672.4-918.1)     0.15 (0.04-0.30)  0.16 (0.13-0.19)
TF     (3) 10%    1240.0 (1132.4-1353.1)  47.9 (1.2-97.9)     809.6 (683.6-929.0)     0.15 (0.01-0.27)  0.16 (0.10-0.22)
TF     (3) 2.5%   1251.4 (1147.1-1369.7)  44.2 (1.4-91.9)     807.8 (695.4-925.8)     0.20 (0.08-0.31)  0.12 (0.07-0.17)
FB     (1)        1289.9 (1191.1-1382.1)  94.2 (25.9-165.1)   1066.2 (961.3-1168.9)   0.10 (0.04-0.17)  -
FB     (2) 100%   1290.0 (1203.0-1382.9)  105.4 (40.2-170.8)  1092.0 (1000.8-1184.1)  -                 0.07 (0.06-0.09)
FB     (2) 10%    1286.0 (1200.1-1383.4)  111.1 (49.6-177.6)  1104.0 (1007.0-1196.4)  -                 0.05 (0.02-0.09)
FB     (2) 2.5%   1285.0 (1197.5-1383.8)  116.2 (53.1-182.8)  1114.0 (1021.1-1211.1)  -                 0.04 (0.01-0.07)
FB     (3) 100%   1291.0 (1200.6-1385.5)  92.3 (28.8-160.5)   1072.0 (963.6-1167.8)   0.05 (0.00-0.10)  0.05 (0.04-0.07)
FB     (3) 10%    1295.0 (1202.9-1393.1)  92.5 (24.2-156.6)   1065.0 (961.4-1177.0)   0.07 (0.00-0.14)  0.04 (0.00-0.07)
FB     (3) 2.5%   1291.0 (1199.4-1393.7)  97.3 (29.6-165.6)   1077.0 (978.0-1188.2)   0.06 (0.01-0.13)  0.03 (0.00-0.06)

1. Trait: TA = Total activity in open field test (observations were multiplied by 10⁻²); TF = Time freezing during cue; FB = Fecal boli after cue (observations were
multiplied by 10¹).

2. Model: (1) = polygenic; (2) = genomic; (3) = polygenic and genomic; with 100/10/2.5% of the markers allowed to have an effect.

3. σ²p: estimates of the total phenotypic variances.

4. σ²c: estimates of variances attributed to the cage effect.

5. σ²e: estimates of residual variances.

6. h²u: heritability based on the polygenic effect.

7. h²a: proportion of the variance attributed to the genomic effect.

The 95%-highest posterior density intervals are given in parentheses.

below 7.5%, PA decreased substantially compared to its
highest value (0.27 vs. 0.35). W6 showed a similar pattern
for between family cross-validation, with a drop-off
for mixtures below 7.5% (from 0.27 to 0.20). Both W6
and W10 showed a trend for a decrease in PA for within
family cross-validation for mixtures below 7.5% in model
(2). TA was the only trait to show a tendency for a
reduction in PA with a decrease in mixture percentage, for
both within (from 0.43 to 0.41) and between family
cross-validation (from 0.34 to 0.28) using model (3). All
other traits showed no significant decrease in PA with
lower mixture percentages. The different modelling of
missing genotypes used for W6m compared to W6 made
almost no difference to PA.

Importance of individual markers 

As an illustration of the statistical relevance of particular
markers, ratios of posterior to prior odds (PPOR) based on
two 2.5% mixture models are shown in Table 6. Model (2)
excludes and model (3) includes polygenic effects. No
trait showed markers with increased evidence for an
effect using mixture percentages higher than 10%. As an
example of the decreasing number of markers showing
evidence for an effect with increasing mixture percentages,
the estimates for TA are illustrated in Figure 1. This
pattern was found for all eight traits using both model (2)
and model (3). Note that the number of SNP markers is
not equal to the number of QTLs; a QTL effect may be
spread over several markers in a region, whereby each
individual marker picks up part of the effect of the QTL.
Table 6 lists the number of markers with substantial
(3.2 < PPOR < 10), strong (10 < PPOR < 100) or decisive
(PPOR > 100) effects. Moreover, Figures 2 and 3 show
Manhattan plots of the PPOR per marker for model (2)
and model (3), respectively. Generally, model (2) detected
more SNP markers to be associated with QTLs than
model (3). Based on model (2), the two weight traits and
TA showed the highest numbers of markers associated
with QTLs (30 to 57 in total), followed by TF (21 in
total). The three traits with the lowest heritabilities, FB,
HC and 175, showed the lowest numbers of markers
associated with QTLs (7 to 10 in total).
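The evidence classes used in Table 6 can be made concrete with a short sketch. It assumes that the PPOR of a marker is obtained from its posterior probability of having a substantial effect and from a prior probability equal to the mixture percentage (0.025 for the 2.5% mixture); the section does not give this formula explicitly, so the function names and the choice of prior are assumptions for illustration.

```python
def ppor(posterior_prob, prior_prob):
    """Ratio of posterior odds to prior odds.

    Both probabilities must lie strictly between 0 and 1.
    """
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def evidence_class(value):
    """Classify a PPOR using the thresholds quoted for Table 6."""
    if value > 100:
        return "decisive"
    if value > 10:
        return "strong"
    if value > 3.2:
        return "substantial"
    return "none"


# Hypothetical posterior probabilities under a 2.5% mixture prior.
for posterior in (0.05, 0.10, 0.50, 0.80):
    ratio = ppor(posterior, 0.025)
    print(f"posterior {posterior:.2f}: PPOR {ratio:.1f}, {evidence_class(ratio)}")
```

Under this assumption, a marker needs a posterior probability of only about 0.08 to reach the substantial class with a 2.5% prior, whereas with a prior of 0.5 the same class would require a posterior probability above about 0.76. This is consistent with more markers showing evidence at low mixture percentages.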
Table 3 Estimated variance components and heritabilities for physiological traits

Trait  Model      σ²p                  σ²c                  σ²e                  h²u               h²a
HC     (1)        212.0 (196.7-229.8)  42.3 (28.4-56.5)     148.0 (131.3-164.1)  0.10 (0.01-0.19)  -
HC     (2) 100%   211.1 (196.3-227.3)  44.5 (32.0-58.1)     152.5 (139.8-165.1)  -                 0.07 (0.05-0.08)
HC     (2) 10%    210.6 (194.0-225.8)  46.3 (32.4-59.8)     154.5 (141.2-167.6)  -                 0.05 (0.00-0.08)
HC     (2) 2.5%   210.5 (194.7-226.5)  46.8 (33.2-60.9)     155.8 (143.7-169.6)  -                 0.04 (0.00-0.07)
HC     (3) 100%   213.7 (196.1-229.6)  40.9 (27.9-54.8)     145.4 (128.9-162.0)  0.08 (0.01-0.17)  0.05 (0.04-0.06)
HC     (3) 10%    212.8 (196.6-229.8)  41.6 (27.9-55.3)     147.2 (130.1-162.7)  0.08 (0.00-0.18)  0.03 (0.00-0.06)
HC     (3) 2.5%   212.9 (197.0-229.7)  41.0 (27.7-54.7)     146.3 (130.1-162.3)  0.10 (0.00-0.18)  0.02 (0.00-0.05)
175    (1)        806.8 (743.4-873.9)  201.9 (150.4-261.2)  475.4 (413.6-534.5)  0.16 (0.07-0.26)  -
175    (2) 100%   811.5 (753.5-876.4)  215.4 (162.3-272.0)  502.4 (461.6-547.2)  -                 0.12 (0.09-0.14)
175    (2) 10%    798.9 (737.0-860.1)  225.1 (169.7-279.2)  517.7 (476.1-562.3)  -                 0.07 (0.03-0.11)
175    (2) 2.5%   798.4 (736.6-858.0)  231.0 (174.1-285.9)  525.4 (485.3-569.4)  -                 0.05 (0.02-0.08)
175    (3) 100%   814.9 (751.4-878.1)  199.6 (147.2-257.5)  474.1 (419.6-529.8)  0.08 (0.02-0.17)  0.09 (0.07-0.11)
175    (3) 10%    803.6 (739.9-867.1)  204.2 (148.9-262.2)  487.6 (429.5-544.9)  0.08 (0.00-0.17)  0.06 (0.02-0.10)
175    (3) 2.5%   806.4 (741.2-869.7)  197.9 (143.2-251.3)  474.6 (413.5-535.3)  0.13 (0.03-0.22)  0.04 (0.01-0.07)

1. Trait: HC = Hematocrit percentage (observations multiplied by 10⁻²); 175 = Insulin level (observations multiplied by 10²).

2. Model: (1) = polygenic; (2) = genomic; (3) = polygenic and genomic; with 100/10/2.5% of the markers allowed to have an effect.

3. σ²p: estimates of the total phenotypic variances.

4. σ²c: estimates of variances attributed to the cage effect.

5. σ²e: estimates of residual variances.

6. h²u: heritability based on the polygenic effect.

7. h²a: proportion of the variance attributed to the genomic effect.

Interval estimates are given in parentheses.



In contrast to the variance estimates and PA, for which
treating missing alleles as a separate 3rd allele did not



Table 4 Predictive abilities for cross-validation within (W) or between (B) families for weight traits

Trait¹     W6            W6m           W10
Model²     W³     B⁴     W³     B⁴     W³     B⁵
(1)        0.62   0.15   0.62   0.15   0.53   0.19
(2) 100%   0.63   0.24   0.63   0.23   0.57   0.29
(2) 70%    0.65   0.26   0.65   0.26   0.58   0.31
(2) 40%    0.65   0.27   0.65   0.27   0.59   0.32
(2) 10%    0.64   0.24   0.64   0.25   0.58   0.33
(2) 7.5%   0.64   0.24   0.64   0.24   0.58   0.33
(2) 5%     0.64   0.22   0.64   0.23   0.57   0.31
(2) 2.5%   0.63   0.20   0.63   0.20   0.56   0.31
(3) 100%   0.64   0.25   0.64   0.25   0.58   0.31
(3) 70%    0.65   0.27   0.65   0.27   0.59   0.33
(3) 40%    0.65   0.27   0.65   0.27   0.59   0.34
(3) 10%    0.65   0.27   0.64   0.25   0.59   0.34
(3) 7.5%   0.65   0.26   0.64   0.25   0.59   0.34
(3) 5%     0.65   0.25   0.64   0.24   0.58   0.33
(3) 2.5%   0.64   0.24   0.63   0.23   0.57   0.31

1. Trait: W6 = Weight at week 6; W6m = Weight at week 6, missing marker
genotypes were treated as 3rd allele; W10 = Weight at week 10.

2. Model: (1) = polygenic; (2) = genomic; (3) = polygenic and genomic; with 100/
10/2.5% of the markers allowed to have an effect.

3. W: cross-validation within families (all s.e. < 0.01).

4. B: cross-validation between families (all s.e. < 0.03).

5. B: cross-validation between families (all s.e. < 0.04).



change their estimates, the number of markers with 
increased evidence to be associated with a QTL was 
much lower for W6m than for W6. 

Figures 2 and 3 show that SNP markers indicating the
presence of QTLs differed little between models (2) and (3)
in terms of the location of the QTLs. Some variation was
visible in the relative weight of markers located close
together, since markers situated near a QTL might each pick
up part of the QTL effect. For example, for trait W6, for
which a QTL region has previously been detected on
chromosome 11, model (2) detected two adjacent markers
in the region with a PPOR of 105 and 14, respectively
(Figure 2), while the same markers for model (3) showed a
PPOR of 29 and 42, respectively (Figure 3). Generally,
model (2) detected more QTLs with decisive evidence,
except for W10, for which two decisive QTLs were found
using model (3).

Discussion 

Heritabilities 

In general, higher heritabilities resulted in an increase in
PA of genomic selection for all traits. Similar results were
found for a different set of traits from this dataset [10,13],
with PA as high as 0.67 for a trait with a high heritability
(weight, h² = 0.74), but as low as 0.27 for a trait with a low
heritability (body length, h² = 0.13). However, the
relationship between heritability and PA was far from
linear, as can be seen when comparing, for example, TF and
175, where the latter trait had a lower heritability but a
higher PA using within family cross-validation. This might
Table 5 Predictive abilities for cross-validation within (W) or between (B) families for behavioural and physiological
traits

Trait¹     TA            TF            FB            HC            175
Model²     W³     B⁴     W⁵     B⁶     W⁵     B⁶     W³     B⁷     W³     B⁴
(1)        0.37   0.16   0.29   -0.04  0.21   0.10   0.33   0.08   0.42   0.08
(2) 100%   0.43   0.34   0.31   0.19   0.22   0.11   0.33   0.05   0.42   0.13
(2) 70%    0.43   0.34   0.31   0.18   0.22   0.12   0.33   0.05   0.42   0.13
(2) 40%    0.43   0.35   0.32   0.19   0.22   0.11   0.33   0.05   0.42   0.13
(2) 10%    0.42   0.34   0.33   0.20   0.21   0.11   0.33   0.06   0.42   0.14
(2) 7.5%   0.42   0.33   0.32   0.20   0.21   0.11   0.33   0.06   0.42   0.14
(2) 5%     0.41   0.30   0.32   0.19   0.21   0.11   0.33   0.06   0.42   0.13
(2) 2.5%   0.40   0.27   0.31   0.18   0.20   0.10   0.33   0.06   0.42   0.12
(3) 100%   0.43   0.33   0.33   0.17   0.22   0.13   0.33   0.06   0.43   0.13
(3) 70%    0.43   0.34   0.33   0.17   0.22   0.12   0.33   0.05   0.43   0.13
(3) 40%    0.43   0.34   0.33   0.17   0.22   0.12   0.33   0.05   0.43   0.13
(3) 10%    0.42   0.33   0.34   0.19   0.22   0.12   0.33   0.06   0.43   0.13
(3) 7.5%   0.42   0.32   0.34   0.18   0.22   0.12   0.33   0.07   0.43   0.13
(3) 5%     0.42   0.30   0.34   0.17   0.22   0.12   0.33   0.07   0.43   0.14
(3) 2.5%   0.41   0.28   0.33   0.16   0.21   0.12   0.33   0.08   0.43   0.12

1. Trait: TA = Total activity in open field test; TF = Time freezing during cue; FB = Fecal boli after cue; HC = Hematocrit percentage; 175 = Insulin level.

2. Model: (1) = polygenic; (2) = genomic; (3) = polygenic and genomic; with 100/10/2.5% of the markers allowed to have an effect.

3. W: cross-validation within families (all s.e. < 0.01).

4. B: cross-validation between families (all s.e. < 0.03).

5. W: cross-validation within families (all s.e. < 0.02).

6. B: cross-validation between families (all s.e. < 0.04).

7. B: cross-validation between families (all s.e. < 0.02).



indicate that other factors besides the heritability have an
influence on the PA of a model.

QTL and individual marker distribution 

In addition to heritability, the influence of the number of 
QTLs on PA of a trait was also investigated. As pointed 
out earlier, the number of SNP markers to be associated 

Table 6 Number of markers associated with QTLs classified by levels of evidence using 2.5% mixture model

        Model (2)                          Model (3)
Trait   Substantial  Strong  Decisive     Substantial  Strong  Decisive
W6      26           3       1            12           6       0
W6m     5            5       0            1            5       0
W10     31           18      0            24           6       2
TA      41           13      3            39           8       0
TF      17           4       0            13           3       0
FB      5            2       0            3            1       0
HC      9            1       0            4            0       0
175     6            2       0            1            2       0

1. Trait: W6 = Weight at week 6; W6m = Weight at week 6, missing marker
genotypes were treated as 3rd allele; W10 = Weight at week 10; TA = Total
activity in open field test; TF = Time freezing during cue; FB = Fecal boli after
cue; HC = Hematocrit percentage; 175 = Insulin level.

2. Substantial: Changes in odds from prior to posterior probability between
>3.2 and <10.

3. Strong: Changes in odds from prior to posterior probability between >10
and <100.

4. Decisive: Changes in odds from prior to posterior probability >100.



with QTLs is higher than the number of QTLs, so that
we found more markers associated with QTLs than the
number of QTLs found by Valdar et al. [14]. Across traits,
the number of markers associated with QTLs depended
partially on the heritability, but especially for traits with
low to moderate heritabilities the number of markers
associated with QTLs varied substantially between traits
with similar heritabilities.

There was a clear tendency for traits with fewer SNP
markers associated with QTLs to have a lower PA in the
case of between family cross-validation. The only
exception was HC, which had the lowest PA but not the
lowest total number of markers associated with QTLs.
However, this trait had a relatively high number of
markers classified with the lowest level of evidence for
QTLs compared to the other traits, which indicates their
low effect size. For within family cross-validation the
tendency was weaker.

Simulation studies, e.g. by Zhong et al. [15], Kizilkaya
et al. [16] and Meuwissen and Goddard [17], have shown
that the number of QTLs affecting a trait influences the
performance of genomic selection, though the influence
differed depending on the methodology that was used to
estimate genomic breeding values. Kizilkaya et al. [16]
found that for a given amount of genetic variance, an
increase in the number of QTLs affecting a trait, and
thereby a reduction of the variance attributed to a single
Figure 1 Marker associations with QTLs based on model (2)
using different mixture percentages. Distribution of the number
of SNP markers showing substantial, strong or decisive evidence to
be associated with QTLs of the trait total activity in open field test
(TA). Changes in odds from prior to posterior probability (PPOR) of
3.2 < PPOR < 10 denote substantial evidence, 10 < PPOR < 100
strong evidence and PPOR > 100 decisive evidence.



QTL, led to a decrease in correlations between true and
predicted genotype in both purebred (from 0.39 to 0.20)
and multi-breed (from 0.42 to 0.30) situations. Assuming
the availability of whole-genome sequence data,
Meuwissen and Goddard [17] found that the accuracy of the
predicted total genetic value using Bayesian methodology
was higher in a scenario simulating three causative QTL
per chromosome compared to that simulating 30 QTL.
They suggested that the lower accuracy in the presence
of more QTL may be caused by the fact that each QTL
was associated with a smaller effect and was therefore
harder to detect and estimate accurately. For W6m, treating
missing SNPs as a 3rd allele reduced the number of
detected SNPs associated with QTLs. This reduction had
no influence on PA.

Behavioural traits versus weight traits and physiological 
traits 

Analysis of variance components based on models (2)
and (3) indicated that behavioural traits showed in
general much lower variability attributable to cage effects.
The larger variability of cage effects for weight and
physiological traits was also found by Valdar et al. [11]
and various reasons, such as the more automated
process used to record behavioural phenotypes, were
discussed. Behavioural traits are generally difficult to
collect in large quantities and difficult to measure directly,
and therefore require suitable proxy traits.

Cross-validation 

There is a vast statistical literature on model comparison 
criteria using Bayesian and frequentist perspectives. In 
this work we focus on the use of genomic models to 



predict genetic values of individuals or to predict future 
observations. In such a context an obvious criterion of 
model comparison is their predictive ability. This was 
studied using cross-validation. Using this method, all 
three models performed equally well using within family 
cross-validation. Extensive pedigree information reduced 
the advantage of genomic information which provided 
only a small benefit relative to polygenic selection. How- 
ever, with less close family ties, as is the case with be- 
tween family cross-validation, genomic information 
became substantially more valuable, in agreement with 
other studies [13,18]. This effect was to some extent 
dependent on a number of factors discussed before, 
namely the heritability and number of QTLs affecting 
the trait. For FB and HC, two traits with low heritabil- 
ities and a small number of QTLs, genomic selection did 
not lead to an increase in PA. This indicates that a larger 
reference population is necessary for these traits to ob- 
tain more accurate inferences of genomic values as dis- 
cussed by Goddard and Hayes [19]. For TF, a trait with 
moderate heritability but a low number of QTLs, 
genomic information led to a substantial increase in PA 
using between family cross-validation. For I75, a trait 
affected by a relatively large number of QTLs, but with low 
heritability, inclusion of genomic information led to a 
moderate increase in PA when between family cross-val- 
idation was used. 

Inclusion of polygenic effects 

Adding polygenic effects to a genomic model influenced 
the estimated variances by picking up the part of the 
genetic variance that was not captured by the genomic 
effects model. However it had little influence on PA. 
Legarra et al. [13] and De los Campos et al. [20] used the 
same dataset and found an increased PA using genomic 
information relative to polygenic information, but little 
difference between a solely genomic model and a com- 
bined genomic-polygenic model. A simulation study 
showed slight increases of accuracy when adding poly- 
genic effects to the genomic model, but this was 
dependent on the extent of linkage disequilibrium be- 
tween adjacent markers [21]. The same study also 
showed that a genomic model underestimates genetic 
variance, but that this is improved by adding a polygenic 
component, as was the case in this study. 

Influence of proportion of markers 

A reduced number of markers assumed to have a sub- 
stantial effect on the trait had an influence on estimates 
of variance but had no significant effect on PA for most 
traits. Mixture models explained less of the variance 
attributed to genomic effects, but resulted in better esti- 
mates of individual SNP effects. As a consequence, the 
PAs between models differed little. Within trait, there 



Kapell et al. BMC Genetics 2012, 13:42 
http://www.biomedcentral.com/1471-2156/13/42 






[Figure 2: eight panels showing PPOR per marker against chromosome 
(x-axis: Chromosome, with ticks at 1, 5, 10, 15 and X; y-axis ticks at 
0, 3.2, 10 and 100). Panels: (a) W6, (b) W6m, (c) W10, (d) TA, (e) TF, 
(f) FB, (g) HC, (h) I75.] 
Figure 2 Marker associations with QTLs based on model (2) using a 2.5% mixture model. A. Weight at week 6 (W6), B. Weight at week 6 
considering missing marker genotypes (W6m), C. Weight at week 10 (W10), D. Total activity in open field test (TA), E. Time freezing during cue 
(TF), F. Fecal boli after cue (FB), G. Hematocrit percentage (HC), H. Insulin level (I75). Changes in odds from prior to posterior probability (PPOR) of 
3.2 < PPOR < 10 denote substantial evidence, 10 < PPOR < 100 strong evidence and PPOR > 100 decisive evidence. 



was a clear relationship between the mixture percentage 
and the number of SNP markers associated with QTLs, 
but no clear association with the PA. TA was the only 
trait to show a significant decrease in PA for within as 
well as between family cross-validation, but only in cases 
where the mixture proportion dropped below 7.5%. 
Weight traits showed almost the same trends for some 
mixture models, but for most traits no change in PA oc- 
curred even at a mixture proportion of 2.5%. Even 
though estimates for PA were not significantly different 
from each other, there seemed to be an optimum 



mixture percentage, with highest values obtained often 
around mixtures of 40%. 

Su et al. [22] found similar results in dairy cattle when 
looking at the squared correlation between true and pre- 
dicted breeding values in bulls, across a range of percen- 
tages of mixtures and traits. Reducing the percentages 
eventually led to lower correlations, but, depending on the 
trait, the decline was small and did not appear until per- 
centages were below 20% (e.g. in the trait fat percentage in 
milk). In traits affected by a small number of QTLs with a 
large effect each, a larger part of the variance is accounted 
Figure 3 Marker associations with QTLs based on model (3) using a 2.5% mixture model. A. Weight at week 6 (W6), B. Weight at week 6 
considering missing marker genotypes (W6m), C. Weight at week 10 (W10), D. Total activity in open field test (TA), E. Time freezing during cue 
(TF), F. Fecal boli after cue (FB), G. Hematocrit percentage (HC), H. Insulin level (I75). Changes in odds from prior to posterior probability (PPOR) of 
3.2 < PPOR < 10 denote substantial evidence, 10 < PPOR < 100 strong evidence and PPOR > 100 decisive evidence. 



for by these QTLs. Reducing the proportion of SNPs might 
lead to an even higher proportion of variance explained by 
these QTLs and a more skewed distribution of SNP-effect 
size, as was shown by Su et al. [22, Figure 2]. In contrast, 
in traits not affected by QTLs of large effect, the variance 
is shared more uniformly among all available SNPs. Similar 
to this link between SNP-effect size and mixture percent- 
age, a larger number of markers showing a high PPOR and 
a more skewed distribution of PPOR was found in the 
present study when the mixture percentage was reduced. 



The relationship between mixture percentage and PPOR 
or SNP-effect size may be a reason for a slightly higher PA 
when the variance is distributed more evenly, which could 
be seen when comparing traits with more QTLs (e.g. TA) 
to traits with few QTLs (e.g. FB). 

Due to the large costs of genotyping, low density SNP 
arrays or methodologies that reduce the numbers of animals 
to be genotyped are of great importance. Research in gen- 
ome-wide association studies has found that a two-stage de- 
sign with pre-selection of SNPs between steps can reduce 






costs substantially without reducing the power of the study 
[8,23]. Another strategy is the use of imputation of haplo- 
types or of missing genotypes, for example long-range phas- 
ing [24,25]. Our results indicate that, depending on trait 
characteristics such as heritability and number of QTLs 
involved, an optimum mixture percentage, i.e. an optimum 
number of SNPs considered to have a substantial effect, 
may exist. This indicates that a pre-selected, optimal subset 
of SNPs could be used for genomic selection of specific 
traits, where high efficiency is combined with lower finan- 
cial costs. However, breeding programmes involve simultan- 
eous selection of many traits, and depending on the degree 
of overlap of the selected SNP markers, the total number of 
selected SNPs may be considerably larger than the number 
of SNPs selected for a single trait. 

Conclusions 

Genomic selection generally performed better than trad- 
itional polygenic selection, as indicated by an increase in 
PA. The increase in PA was most pronounced in the case of 
between family cross-validation. Larger increases in PA 
were found for traits with lower heritabilities, but the 
underlying number of QTLs affecting the trait had an im- 
portant effect. Traits with a small number of QTLs showed 
lower PA using the genomic model compared to the poly- 
genic model. Behavioural traits showed a lower variance of 
cage effects than other traits, but no difference in efficiency 
of genomic selection compared to traits with a similar herit- 
ability. Models including both polygenic and genomic 
effects captured more of the genetic variance, but did not 
improve PA. The dataset was restricted to genotyped ani- 
mals only; incorporation of non-genotyped animals may 
show different results owing to, for example, lower errors 
of estimation of fixed effects and higher accuracy of predic- 
tion of polygenic effects [26]. 

Reducing the number of SNP markers assumed to have 
a substantial effect in a mixture model did not signifi- 
cantly change PA for most traits, particularly in the con- 
text of within family cross-validation. The mixture 
approach showed that models using different percentages 
of SNPs affecting the trait performed efficiently even 
with low percentages, which may be of greater import- 
ance in the future with increasing sizes of SNP arrays. 

In the present work, the a priori probability that a 
marker has a detectable effect was treated as a 
known parameter. In common with other results from 
the literature, this did not have a clear effect on the PA 
of the models. However, as shown in Figure 1, the a 
priori probability influences the number of detectable 
markers a posteriori. Therefore when focus is on detec- 
tion, it would be desirable to infer the probability of mar- 
kers with detectable effects from the data. Recently, 
Bayesian implementations of such methods have been 
developed [27]. 



Methods 

Animals 

Data on 2,188 geno- and phenotyped mice provided by 
the Wellcome Trust Centre for Human Genetics were 
used to analyse the efficiency of genomic selection in 
seven traits. The data were freely available [28] and the 
care and use of animals were performed in compliance 
with the guidelines at the Wellcome Trust Centre for 
Human Genetics, University of Oxford, UK. The popula- 
tion has already been described and analyzed compre- 
hensively in various papers including Solberg et al. [29] 
and Valdar et al. [11]. Therefore, only the aspects im- 
portant for the present analysis will be highlighted here. 
Animals were obtained from crossing eight purebred 
mouse strains, followed by 50 generations of pseudo-random 
mating. The data comprised 175 full-sib families 
belonging to one generation and were collected over a 
period of three years, with a pedigree that consisted of 
parents and grandparents (2,890 animals in total). Par- 
ents and grandparents had no phenotypic records. 

Single nucleotide polymorphism markers 

After removing uninformative markers, 10,496 SNPs 
were retained for the analysis. All animals had a call rate 
above 95% and 99% of all SNPs had call rates higher than 
99%. Missing alleles were imputed at random based on 
the Hardy-Weinberg equilibrium conditional on the 
observed allelic frequencies of genotyped SNPs. The ran- 
dom numbers were generated based on a uniform distri- 
bution. The extent of linkage disequilibrium between 
pairs of markers was low, with $r^2 < 0.5$ within 2 Mb 
and $r^2 < 0.2$ within 8 Mb [14]. 
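
The random imputation step described above can be illustrated with a minimal sketch. It assumes that each SNP is coded as the number of copies (0, 1 or 2) of one of its two alleles, with `None` marking a missing genotype, and that whole genotypes rather than single alleles are sampled; the function name `impute_hwe` and the example data are hypothetical and do not come from the original analysis.

```python
import random


def impute_hwe(genotypes, rng):
    """Fill missing genotypes (None) of one SNP by sampling under Hardy-Weinberg equilibrium.

    genotypes: list of allele counts (0, 1 or 2) or None for a missing call.
    rng: a random.Random instance that supplies uniform random numbers.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes for this SNP")
    # Observed frequency of the counted allele.
    p = sum(observed) / (2 * len(observed))
    # Hardy-Weinberg probabilities of carrying 0 or 1 copies (2 copies is the remainder).
    prob0 = (1 - p) ** 2
    prob1 = 2 * p * (1 - p)

    imputed = []
    for g in genotypes:
        if g is None:
            u = rng.random()  # uniform draw on [0, 1)
            g = 0 if u < prob0 else (1 if u < prob0 + prob1 else 2)
        imputed.append(g)
    return imputed


rng = random.Random(1)
snp = [0, 1, 2, None, 1, 1, None, 0]  # hypothetical genotypes for one SNP
filled = impute_hwe(snp, rng)
print(filled)
```

Observed genotypes are left untouched; only the missing entries are replaced by a draw that depends on the allele frequency of that SNP.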

Phenotypic data 

Traits were chosen across a range of heritabilities, type 
(weight, behavioural or physiological) and number of QTLs 
(Table 7), based on Valdar et al. [11, 14, suppl.]. Weight 
traits included body weight at the start of the test at six 
weeks of age (W6) and body weight at the end of the test at 
ten weeks of age (W10). Behavioural traits included three 
measurements. One measurement was recorded as part of 
an open field test (a model of anxiety) at six weeks of age, 
namely total activity, measured as distance travelled in a 
time span of five minutes (TA). Two measurements were 
recorded as part of a cue conditioning test at seven weeks 
of age, whereby freezing to a tone after association with a 
foot shock was measured: time spent freezing during cue in 
minutes (TF) and number of fecal boli after cue (FB). 
Physiological traits were hematocrit percentage in blood as 
part of a full blood count test (HC) and insulin level at 75 
minutes after intraperitoneal injection with glucose dose as 
part of a test to model type 2 diabetes mellitus, at nine 
weeks of age (I75). For further information regarding the 
biology behind these traits we refer to Solberg et al. [29]. 
Table 7 Description of the traits used in the genetic analyses 

Trait                                   Type            Count   $h^2$ (1)   QTL (1)   T (2) 
Weight at week 6 (W6)                   Weight          1916    0.74        19        $x^{1/3}$ 
Weight at week 10 (W10)                 Weight          1880    0.62        20        $x^{1/3}$ 
Total activity in open field test (TA)  Behavioural     1879    0.34        16        $x$ 
Time freezing during cue (TF)           Behavioural     1389    0.31        1         $x$ 
Fecal boli after cue (FB)               Behavioural     1511    0.10        2         $(x + 1)^{1/2}$ 
Hematocrit percentage (HC)              Physiological   1578    0.11        1         $x^3$ 
Insulin level (3) (I75)                 Physiological   1701    0.13        10        $x^{1/3}$ 

1. $h^2$; QTL: reported by Valdar et al. [11,14, suppl.]. 

2. T: transformation used for the trait. 

3. Insulin level: measured at 75 minutes after injection of glucose. 



These traits were normalized using the transformation 
given in Valdar et al. [11] and subsequently multiplied or 
divided by appropriate factors to avoid rounding errors in 
the multi-marker programme. To investigate the influence 
of low frequencies of missing SNPs, weight at 6 weeks was 
also analysed with missing values for SNPs treated as a separate 
third allele with low frequency (W6m). 

Statistical analysis 

All traits were treated as normally distributed and analyzed 
incorporating fixed effects and covariates based on the 
models reported by Valdar et al. [11]. Fixed effects were sex 
(W6, W6m, W10, TA, FB, HC, I75), year-month (W6, 
W6m, W10, TA), parity (W6, I75), experimenter (TA, I75), 
apparatus (TF) and month (I75); covariates comprised cage 
density (W6, W6m, W10, I75), age in days (W6, W6m, 
W10) and weight (HC, I75). Cage was added as a random 
effect for all traits. Cages consisted almost solely of animals 
from one family. For all practical purposes cage was nested 
within family (avg. 3.1 cages per family). 

Three basic groups of models were used to compare 
changes in variance components and PA as a result of 
using genomic information. One model used only poly- 
genic effects (1), a second model used only genomic effects 
(2), and a third model fitted both effects (3). For models 
(2) and (3), seven different sub-models were considered 
based on the percentage of markers that was assumed to 
have a substantial effect. This included a non-mixture 
model using 100% and six mixture models with 
70%, 40%, 10%, 7.5%, 5% or 2.5% of the SNPs assumed to 
have a substantial effect. In the following, these sub-mod- 
els will be labelled according to the mixture percentages. 
All analyses were performed using a Bayesian approach 
and implemented with Markov chain Monte Carlo meth- 
ods [30] using the programme iBay [31]. The basic model 
using polygenic effects can be described as follows: 

$$y = \mu + X_1 b + X_2 c + Zu + e, \qquad (1)$$ 

where $\mu$ fits a general mean and the vectors $b$, $c$, $u$ 
and $e$ fit the fixed, cage ($c \sim N(0, I\sigma^2_c)$), polygenic 
($u \sim N(0, A\sigma^2_u)$) and residual ($e \sim N(0, I\sigma^2_e)$) effects, 
respectively. $I$ is the identity matrix and $A$ the additive 
genetic relationship matrix. $X_1$, $X_2$ and $Z$ are incidence 
matrices relating the vectors $b$, $c$ and $u$ with $y$. 
This is the mixed model which is most commonly 
used to predict traditional breeding values in animal 
breeding programmes. For the model using genomic 
effects, model (1) was changed to a Bayesian multi- 
marker association model as follows: 

$$y = \mu + X_1 b + X_2 c + Qas + e, \qquad (2)$$ 

where $Qas$ fits the genomic effects, with $a$ the vector 
representing effects associated with marker alleles 
($a \sim N(0, 1)$), $s$ a scaling factor modelling the variance 
explained by each marker and $Q$ the design matrix 
linking alleles with markers [31]. Priors were assigned 
to the scaling factor s as follows for the non-mixture 
models: 

$$s \sim TN_{>0}(0, \sigma^2_g),$$ 

where $\sigma^2_g$ can be interpreted approximately as the 
expected average fitted variance per marker and $TN$ 
denotes a truncated normal distribution. For mixture 
models the following scaling factors $s$ were used: 

$$s \sim \begin{cases} N(0, \sigma^2_{g0}) & \text{with probability } \pi_0 \\ TN_{>0}(0, \sigma^2_{g1}) & \text{with probability } \pi_1 = 1 - \pi_0 \end{cases}$$ 

where the first distribution models the markers with on 
average little to no effect at a proportion $\pi_0$, and the second 
distribution models the markers that have a substantial 
effect at a proportion $\pi_1$. The proportions of markers $\pi_1$ were 
varied across mixture models ranging from 100 to 2.5%. 
Variances for the first distribution ($\sigma^2_{g0}$) were set to 1% of 
the phenotypic variance of the trait divided by the number 
of markers. No polygenic effects were present and all other 
effects were as described for model (1). Using the method- 
ology of genomic selection as described by Meuwissen et al. 
[1], it was possible to solve models with more markers than 
phenotypic records. The last model, which combined both 
genetic effects of models (1) and (2), can be described as 
follows: 

$$y = \mu + X_1 b + X_2 c + Qas + Zu + e, \qquad (3)$$ 

where the effects are as defined earlier. Here the polygenic 
variance of $u$ accounts for genetic variation which could not 
be explained by the genomic markers $a$. 
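
To give a feel for the mixture prior used in models (2) and (3), the following minimal sketch computes the variance of the "little to no effect" component (1% of the phenotypic variance divided by the number of markers) and draws, for each marker, whether it falls a priori into the substantial-effect component. The function names, the seed and the phenotypic variance of 1.0 are hypothetical placeholders; the marker count is the one given in the Methods and the proportion corresponds to the 2.5% mixture model. The sketch does not reproduce the sampler implemented in iBay.

```python
import random


def small_effect_variance(phenotypic_variance, n_markers):
    """Variance of the first mixture component: 1% of the phenotypic variance per marker."""
    return 0.01 * phenotypic_variance / n_markers


def draw_component_indicators(n_markers, pi1, rng):
    """Draw 1 (substantial effect, probability pi1) or 0 (probability 1 - pi1) for each marker."""
    return [1 if rng.random() < pi1 else 0 for _ in range(n_markers)]


rng = random.Random(42)
n_markers = 10496  # SNPs retained for the analysis, as stated in the Methods
pi1 = 0.025        # the 2.5% mixture model

indicators = draw_component_indicators(n_markers, pi1, rng)
var_g0 = small_effect_variance(phenotypic_variance=1.0, n_markers=n_markers)  # 1.0 is a placeholder

print(sum(indicators), "markers fall a priori in the substantial-effect component")
print(f"sigma2_g0 = {var_g0:.3e}")
```

With these settings only a few hundred of the markers are expected to belong to the substantial-effect component a priori, which illustrates how strongly a low mixture percentage concentrates the prior.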

Variance components 

Estimates for the variance of polygenic effects ($\sigma^2_u$), 
variance of genomic effects ($\sigma^2_a$), cage variance ($\sigma^2_c$), 
residual variance ($\sigma^2_e$) and total phenotypic variance ($\sigma^2_p$) 
were calculated using information from all animals that 
had both genomic and phenotypic information. The 
variance of genomic effects $\sigma^2_a$ is calculated as the sum of 
the contributions to the genetic variance from each marker, 
plus all possible covariances due to linkage disequilibrium, 
taking into account the allele frequencies. The 
heritabilities for the polygenic effects ($h^2_u$) and genomic 
effects ($h^2_a$) were calculated based on their corresponding 
variance components ($\sigma^2_u$ and $\sigma^2_a$, respectively) as 
proportion of the phenotypic variance. The software iBay 
required that animals had both genomic and phenotypic 
data available to be included in the analysis. 
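
Written out, the two heritabilities are ratios of the corresponding variance component to the total phenotypic variance, and the variance of genomic effects is the variance of a sum of marker contributions. In the sketch below, $q_{ij}$ (the genotype covariate of animal $i$ at marker $j$) and $\beta_j$ (the fitted effect of marker $j$) are notation introduced here for illustration only and do not appear in the original model description.

```latex
h^2_u = \frac{\sigma^2_u}{\sigma^2_p}, \qquad
h^2_a = \frac{\sigma^2_a}{\sigma^2_p}, \qquad
\sigma^2_a = \operatorname{Var}\Big(\sum_j q_{ij}\,\beta_j\Big)
           = \sum_j \beta_j^2 \operatorname{Var}(q_{ij})
           + \sum_{j \neq k} \beta_j \beta_k \operatorname{Cov}(q_{ij}, q_{ik})
```

The first sum collects the contribution of each marker, which depends on its allele frequency through $\operatorname{Var}(q_{ij})$, and the second sum collects the covariances that arise from linkage disequilibrium between markers.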

Predictive ability 

PA was calculated as the Pearson correlation between a 
predicted observation and the corresponding realized ob- 
servation. Realized observation was calculated as the 
phenotype corrected for fixed effects and covariates, 
while the predicted observation was the estimated breed- 
ing value, as was done by Legarra et al. [13]. To predict 
these observations, a cross validation approach was used, 
whereby the dataset was split into a validation set and a 
training set. The validation set contained the animals for 
which the observation had to be predicted, while the 
training set was used to estimate the parameters for the 
prediction model. Size of the training set is of import- 
ance for the estimation of accurate breeding values [19] 
and to ensure a sufficient size of training population, a 
1:5 proportion of validation to training dataset was used. 
Only animals from families with at least two members 
were used to create validation sets (~ 80% of all animals). 
These animals were randomly split into five groups to 
create five validation sets. Thus each validation set 
contained ~16% of all animals. This was repeated to create 
ten validation sets in total. Each validation set had a cor- 
responding training set, which contained the remaining 
animals with phenotypic data. 
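
The cross-validation scheme and the PA statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the seed and the toy population of 100 animals (of which 80 are eligible for validation) are hypothetical, and the prediction step itself is left out.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between predicted and realized observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeated_five_fold(eligible_ids, all_ids, n_repeats, rng):
    """Yield (validation, training) lists of animal IDs.

    Eligible animals are split at random into five validation groups; this is
    repeated n_repeats times. The training set holds all remaining animals.
    """
    for _ in range(n_repeats):
        shuffled = list(eligible_ids)
        rng.shuffle(shuffled)
        for k in range(5):
            validation = shuffled[k::5]
            held_out = set(validation)
            training = [a for a in all_ids if a not in held_out]
            yield validation, training


rng = random.Random(0)
all_ids = list(range(100))    # hypothetical animals with phenotypes
eligible_ids = all_ids[:80]   # e.g. animals from families with at least two members
splits = list(repeated_five_fold(eligible_ids, all_ids, n_repeats=2, rng=rng))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 16 84
```

In this toy setting each of the ten validation sets holds 16% of all animals, so the ratio of validation to training animals is roughly 1:5; `pearson` would then be applied to the predicted and realized observations of each validation set.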

Two different routines for splitting the data were used: 
within family and between family cross-validation. For 
within family cross-validation, full sib families were ran- 
domly split between training and validation set such that 



each set contained at least one animal from a family. For 
between family cross-validation, families were split such 
that no full sib family would have animals in both data- 
sets simultaneously. As a result, for between family 
cross-validation no close genetic connectedness due to 
full sib families was available between training and valid- 
ation data. In the case of within family cross-validation, 
full sibs with phenotypic data linked the breeding values 
of the training and validation data. 
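
The difference between the two splitting routines can be made concrete with a small sketch. It assumes a mapping from animal to full-sib family (`family_of`, hypothetical) and simplifies the random assignment: in the between family routine whole families are dealt to folds, while in the within family routine the members of each family are spread over different folds, so that every validated animal has at least one full sib in the training data when families have two or more members. The function names and the toy data are illustrative only.

```python
import random
from collections import defaultdict


def between_family_folds(family_of, n_folds, rng):
    """Assign whole families to folds, so no family is split across training and validation."""
    families = sorted(set(family_of.values()))
    rng.shuffle(families)
    fold_of_family = {fam: i % n_folds for i, fam in enumerate(families)}
    folds = [[] for _ in range(n_folds)]
    for animal, fam in family_of.items():
        folds[fold_of_family[fam]].append(animal)
    return folds


def within_family_folds(family_of, n_folds, rng):
    """Spread the members of each family over consecutive folds, starting at a random fold."""
    members = defaultdict(list)
    for animal, fam in family_of.items():
        members[fam].append(animal)
    folds = [[] for _ in range(n_folds)]
    for fam in sorted(members):
        sibs = members[fam]
        rng.shuffle(sibs)
        start = rng.randrange(n_folds)
        for i, animal in enumerate(sibs):
            folds[(start + i) % n_folds].append(animal)
    return folds


rng = random.Random(7)
family_of = {animal: animal // 4 for animal in range(40)}  # 10 hypothetical families of 4 full sibs
between = between_family_folds(family_of, n_folds=5, rng=rng)
within = within_family_folds(family_of, n_folds=5, rng=rng)
print([len(f) for f in between], [len(f) for f in within])
```

Each fold serves once as the validation set, with the remaining folds forming the training set. Under the between family routine the training set never contains a full sib of a validated animal, which removes the close pedigree link that the polygenic model relies on.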

Importance of individual markers 

As an illustration, the relative importance of individual 
markers was quantified via the computation of Bayes 
Factors, conditional on either model (2) or model (3). 
The correct inferences about the statistical relevance of 
particular markers could involve, first, calculation of the 
posterior probability of each model. Secondly one could 
report Bayes factors conditional on the model with lar- 
gest posterior probability, or averaging over all models. 
This task was judged to be computationally too burden- 
some and was not undertaken in this study. As indicated 
in Table 7, traits were chosen across a range of number 
of QTLs, ranging from as low as 1 for TF and HC up to 
20 for W10. The objective was to compare the perform- 
ance of genomic models (2) and (3) in finding regions 
with evidence of a marker having an increased effect, 
and to study how the number of QTLs affecting a trait 
influences the efficiency of genomic selection. Using the 
Bayesian approach implemented in the programme iBay 
[31], the Bayes Factor computed as the change in prior 
to posterior odds (PPOR) for each marker was calculated 
with the following formula: 

$$\mathrm{PPOR} = \frac{p_1 / (1 - p_1)}{\pi_1 / (1 - \pi_1)},$$ 

where $p_1$ is the estimate for the posterior probability of 
the marker having a substantial effect, and $\pi_1$ the a priori 
probability that the marker has a substantial effect. 
Results were plotted per trait for all markers, whereby a 
PPOR > 3.2 can be interpreted as substantial evidence 
for the marker to have an increased effect, a PPOR > 10 
as strong evidence, and a PPOR > 100 as decisive [31]. 
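
As a worked illustration of the PPOR and its interpretation, the following minimal sketch computes the ratio of posterior to prior odds and maps it onto the evidence categories given above. The function names and the label used for values of 3.2 or below are hypothetical; the posterior probability of 0.5 is an invented example value, and the calculation assumes that both probabilities lie strictly between 0 and 1.

```python
def ppor(p1, pi1):
    """Ratio of posterior odds to prior odds that a marker has a substantial effect."""
    posterior_odds = p1 / (1.0 - p1)
    prior_odds = pi1 / (1.0 - pi1)
    return posterior_odds / prior_odds


def evidence_label(value):
    """Map a PPOR value onto the evidence categories used for the marker plots."""
    if value > 100:
        return "decisive"
    if value > 10:
        return "strong"
    if value > 3.2:
        return "substantial"
    return "not substantial"  # label chosen here for values of 3.2 or below


# A marker with posterior probability 0.5 under the 2.5% mixture model.
value = ppor(p1=0.5, pi1=0.025)
print(round(value, 2), evidence_label(value))
```

In this example the posterior odds are 1 and the prior odds are 0.025/0.975, so the PPOR is about 39, which falls in the "strong evidence" category. A marker whose posterior probability equals its prior probability has a PPOR of 1.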